Skip to content

chore: Eliminate unnecessary reconciliations after progressive rollout failure#976

Merged
juliev0 merged 5 commits intomainfrom
reduce-reconciliations
Mar 19, 2026
Merged

chore: Eliminate unnecessary reconciliations after progressive rollout failure#976
juliev0 merged 5 commits intomainfrom
reduce-reconciliations

Conversation

@juliev0
Copy link
Collaborator

@juliev0 juliev0 commented Mar 17, 2026

Fixes #773

Modifications

After we assess a progressive rollout failure, we don't need to be requeing every 20 seconds like we have been. This wastes cpu cycles.

Once there is an update to the rollout spec, we will reconcile again. There's nothing to do in the meantime.

Verification

I performed the following tests with both pipelinerollout, isbservicerollout, and monovertexrollout:

  • Update to a failed spec and observe the rollout fail
  • Observe that the numaplane log stops showing reconciliations
  • Observe that pipelines are correctly scaled during failure
  • Fix the rollout spec and observe that everything works as usual as far as progressive success case
  • Observe that the promoted child is now running at full scale

Backward (and forward) incompatibilities

N/A

juliev0 added 5 commits March 16, 2026 21:54
Signed-off-by: jvogelman <julie_vogelman@intuit.com>
Branch-Creation-Time: 2026-03-16T21:52:40+0000
Signed-off-by: jvogelman <julie_vogelman@intuit.com>
Signed-off-by: jvogelman <julie_vogelman@intuit.com>
Signed-off-by: jvogelman <julie_vogelman@intuit.com>
Signed-off-by: jvogelman <julie_vogelman@intuit.com>

// if we still have interstepbufferservices that need deleting, or if we're in the middle of an upgrade strategy, then requeue
if !allDeleted || inProgressStrategy != apiv1.UpgradeStrategyNoOp {
if !allDeleted || inProgressStrategy == apiv1.UpgradeStrategyPPND {
Copy link
Collaborator Author

@juliev0 juliev0 Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the case of Progressive, we have other logic which is causing us to requeue if we're in the middle of the assessment - we don't need this

requeueDelay = progressiveRequeueDelay
}
if assessmentComplete {
if !failed {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github is making it seem like the code in this block failed, but it hasn't. Only the if done got changed to:

if assessmentComplete {
			if !failed {

}

// requeue using the provided delay
return progressiveRequeueDelay, nil
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requeue if we're mid-assessment

requeueDelay = common.DefaultRequeueDelay

} else {
requeueDelay = progressiveRequeueDelay
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this requeueing is taken care of at the top where we now do this:

if progressiveRequeueDelay > 0 {
		requeueDelay = progressiveRequeueDelay
}

@juliev0 juliev0 marked this pull request as ready for review March 17, 2026 20:53
@juliev0 juliev0 requested a review from afugazzotto as a code owner March 17, 2026 20:53
Copy link
Contributor

@dpadhiar dpadhiar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@juliev0 juliev0 merged commit 22bbfe2 into main Mar 19, 2026
64 of 66 checks passed
@juliev0 juliev0 deleted the reduce-reconciliations branch March 19, 2026 19:25
juliev0 added a commit that referenced this pull request Mar 19, 2026
…t failure (#976)

Signed-off-by: jvogelman <julie_vogelman@intuit.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Avoid requeueing if we fail progressive upgrade

2 participants